# Audio Processing

- **Wav2vec Checkpoints** (Zeyadd-Mostaffa · Apache-2.0 · Speech Recognition, Transformers · 19 downloads · 0 likes): A fine-tuned speech processing model based on facebook/wav2vec2-base, achieving 99.48% accuracy on the evaluation set.
- **Distilhubert Finetuned Gtzan 5 Epochs Finetuned Gtzan Finetuned Gtzan** (duysal · Audio Classification, Transformers · 5 downloads · 0 likes): An audio classification model based on the DistilHuBERT architecture, fine-tuned on the GTZAN dataset for music genre classification (see the loading sketch below).
- **Deepfake Audio Detection** (motheecreator · Apache-2.0 · Speech Recognition, Transformers · 1,468 downloads · 7 likes): A speech processing model further fine-tuned from wav2vec2-base-finetuned, achieving 98.82% accuracy on the evaluation set.
- **Wav2vec2 Base Finetuned** (mo-thecreator · Apache-2.0 · Speech Recognition, Transformers · 19 downloads · 4 likes): A speech processing model fine-tuned from facebook/wav2vec2-base, achieving 99.97% accuracy on the evaluation set.
- **Wav2vec2 Base Finetuned** (motheecreator · Apache-2.0 · Speech Recognition, Transformers · 105 downloads · 4 likes): A speech processing model fine-tuned from facebook/wav2vec2-base, achieving 99.97% accuracy on the evaluation set.
- **Distilhubert Finetuned Chorddetection** (alejogil35 · Apache-2.0 · Audio Classification, Transformers · 14 downloads · 1 like): A chord detection model fine-tuned from DistilHuBERT, trained on the ChordStimation dataset with an evaluation accuracy of 100%.
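
The DistilHuBERT classifiers above follow the standard `transformers` audio-classification interface, so they can usually be loaded with the `pipeline` API. A minimal sketch, assuming the GTZAN model is published under the repository ID `duysal/distilhubert-finetuned-gtzan-5-epochs-finetuned-gtzan-finetuned-gtzan` (the exact ID and the local file name `song.wav` are assumptions):

```python
from transformers import pipeline

# Repository ID is an assumption inferred from the listing above.
MODEL_ID = "duysal/distilhubert-finetuned-gtzan-5-epochs-finetuned-gtzan-finetuned-gtzan"

# Build an audio-classification pipeline; the bundled feature extractor
# resamples the input to the sampling rate the model expects.
classifier = pipeline("audio-classification", model=MODEL_ID)

# Classify a local clip and print the top predicted genres with scores.
for prediction in classifier("song.wav", top_k=5):
    print(f"{prediction['label']}: {prediction['score']:.3f}")
```

The same pattern applies to the chord detection model above, swapping in its repository ID.
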
- **Snoop** (sail-rvc · Speech Synthesis, Transformers · 3,997 downloads · 0 likes): An audio-to-audio model based on RVC (Retrieval-based Voice Conversion) technology, primarily used for voice conversion tasks.
- **Ronaldo** (sail-rvc · Speech Synthesis, Transformers · 3,855 downloads · 1 like): A voice conversion model based on RVC (Retrieval-based Voice Conversion) that transforms input audio into speech with a specific style.
- **Drake RVC** (sail-rvc · Speech Synthesis, Transformers · 5,043 downloads · 1 like): Drake_RVC is an audio-to-audio model based on RVC technology, specifically designed for voice conversion tasks (see the download sketch below).
- **Cardib2333333** (sail-rvc · Speech Synthesis, Transformers · 807 downloads · 1 like): A voice conversion model based on RVC technology, capable of transforming input audio into speech with a specific style.
- **CJ RVC V2 400 Epochs** (sail-rvc · Speech Synthesis, Transformers · 1,949 downloads · 0 likes): A voice conversion model based on RVC technology, trained for 400 epochs and suitable for audio-to-audio tasks.
- **Andrewtate** (sail-rvc · Speech Synthesis, Transformers · 910 downloads · 3 likes): A voice conversion model based on RVC technology that converts input audio into Andrew Tate's voice style.
- **Alvin** (sail-rvc · Speech Synthesis, Transformers · 4,909 downloads · 0 likes): An RVC (Retrieval-based Voice Conversion) model designed for audio-to-audio conversion tasks.
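
The sail-rvc entries above are RVC voice-conversion checkpoints rather than standard `transformers` models, so inference requires RVC-specific tooling. What can be shown generically is fetching the checkpoint files from the Hub; a minimal sketch using `huggingface_hub`, assuming the repository ID `sail-rvc/Drake_RVC` (inferred from the description above and not verified):

```python
from huggingface_hub import snapshot_download

# Repository ID is an assumption; adjust it for other sail-rvc voices.
local_dir = snapshot_download(repo_id="sail-rvc/Drake_RVC")

# The downloaded folder contains the RVC checkpoint files, which would then
# be passed to an external RVC inference tool for voice conversion.
print("Checkpoint files downloaded to:", local_dir)
```
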
- **Audio Cls Unispeech Sat Base 100h Libri Ft Minds14 Finetune** (jonastokoliu · Apache-2.0 · Audio Classification, Transformers · 21 downloads · 0 likes): A speech classification model fine-tuned on the minds14 dataset from microsoft/unispeech-sat-base-100h-libri-ft.
- **Wav2vec2 Base Finetuned Amd** (justin1983 · Apache-2.0 · Speech Recognition, Transformers · 14 downloads · 0 likes): A fine-tuned version of facebook/wav2vec2-base on an unknown dataset, primarily used for speech recognition, achieving 84.55% accuracy on the evaluation set.
- **Whisper Small Ft Common Language Id** (sanchit-gandhi · Apache-2.0 · Audio Classification, Transformers · 256.20k downloads · 2 likes): A spoken language identification model fine-tuned from openai/whisper-small, achieving 88.6% accuracy on the evaluation set.
- **Wav2vec2 Base Finetuned Ie** (minoosh · Apache-2.0 · Speech Recognition, Transformers · 14 downloads · 0 likes): A fine-tuned version of the facebook/wav2vec2-base model for a specific downstream task.
- **Wav2vec2 Base Finetuned Ks** (FerhatDk · Apache-2.0 · Speech Recognition, Transformers · 38 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving 87.27% accuracy on the evaluation set.
- **Wav2vec2 Base Ft Cv3 V3** (danieleV9H · Apache-2.0 · Speech Recognition, Transformers · 120 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base on the Common Voice 3.0 English dataset, achieving a word error rate of 0.247 on the test set (see the transcription sketch below).
- **Wav2vec Trained** (eugenetanjc · Apache-2.0 · Speech Recognition, Transformers · 70 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate of 0.1042 on the evaluation set.
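
Most entries in this list are wav2vec2-style ASR fine-tunes whose quality is reported as word error rate (WER). A minimal sketch of transcribing audio with the `transformers` ASR pipeline and scoring the output with `jiwer`; the repository ID `danieleV9H/wav2vec2-base-ft-cv3-v3`, the file name, and the reference transcript are assumptions for illustration:

```python
from transformers import pipeline
import jiwer

# Repository ID is an assumption based on the "Wav2vec2 Base Ft Cv3 V3" entry above.
asr = pipeline("automatic-speech-recognition", model="danieleV9H/wav2vec2-base-ft-cv3-v3")

# Transcribe a local 16 kHz mono clip.
hypothesis = asr("sample.wav")["text"]

# WER = word-level edit distance between reference and hypothesis, divided by
# the number of reference words (0.247 in the entry above means roughly 24.7%).
reference = "the expected ground truth transcript goes here"
print("WER:", jiwer.wer(reference, hypothesis))
```
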
- **Resepformer Wsj02mix** (speechbrain · Apache-2.0 · Sound Separation, English · 488 downloads · 3 likes): An audio source separation model based on the RE-SepFormer architecture, implemented in SpeechBrain and trained on the WSJ0-2Mix dataset.
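
The SpeechBrain entry above is a source-separation model rather than a transformers checkpoint, so it is loaded through SpeechBrain's pretrained interface. A minimal sketch, assuming the Hub ID `speechbrain/resepformer-wsj02mix` and a local 8 kHz two-speaker mixture file:

```python
import torchaudio
from speechbrain.pretrained import SepformerSeparation

# Hub ID is an assumption based on the "Resepformer Wsj02mix" entry above.
model = SepformerSeparation.from_hparams(
    source="speechbrain/resepformer-wsj02mix",
    savedir="pretrained_models/resepformer-wsj02mix",
)

# Separate a two-speaker mixture (WSJ0-2Mix models expect 8 kHz audio).
est_sources = model.separate_file(path="mixture_8k.wav")

# est_sources has shape [batch, time, n_sources]; save each estimated speaker.
torchaudio.save("speaker1.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("speaker2.wav", est_sources[:, :, 1].detach().cpu(), 8000)
```
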
- **Wav2vec2 Base Vios Commonvoice 1** (tclong · Apache-2.0 · Speech Recognition, Transformers · 21 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice dataset, supporting automatic speech recognition tasks.
- **Wav2vec2 Final 1 Lm 3** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate of 0.4499 on the evaluation set, which drops to 0.126 when decoding with a 4-gram language model (see the LM decoding sketch below).
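
The WER improvement quoted for the entry above comes from beam-search decoding with an n-gram language model. In `transformers` this is typically done through `Wav2Vec2ProcessorWithLM`, which wraps a pyctcdecode decoder (requires `pyctcdecode` and `kenlm` installed). A minimal sketch, assuming a repository that ships the n-gram alongside the acoustic model; the ID below is a commonly used demo checkpoint, not the listed chrisvinsen model, and the file name is an assumption:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

# Demo checkpoint that bundles a KenLM n-gram with the CTC model (assumed ID).
MODEL_ID = "patrickvonplaten/wav2vec2-base-100h-with-lm"

processor = Wav2Vec2ProcessorWithLM.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Load a clip, resample to 16 kHz, and mix down to mono.
waveform, sr = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode runs pyctcdecode beam search with the bundled n-gram LM.
print(processor.batch_decode(logits.numpy()).text[0])
```
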
- **Wav2vec2 17** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 17 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, supporting automatic speech-to-text tasks.
- **Wav2vec2 11** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 18 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, supporting automatic speech-to-text tasks.
- **Wav2vec2 10** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 20 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 1.0 on the evaluation set.
- **Wav2vec2 5** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 20 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, primarily used for automatic speech recognition (ASR) tasks.
- **Wav2vec2 4** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, primarily used for automatic speech recognition tasks.
- **Wav2vec2 3** (chrisvinsen · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base with a word error rate (WER) of 1.0.
- **Wav2vec2 Base Demo Colab** (brever · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, achieving a word error rate of 31.42% on the evaluation set.
- **20220517 150219** (lilitket · Apache-2.0 · Speech Recognition, Transformers · 29 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-xls-r-300m, supporting automatic speech recognition (ASR) tasks.
- **Wav2vec2 Base Timit Demo Colab9** (hassnain · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, primarily used for English speech-to-text tasks.
- **Wav2vec2 Base Toy Train Data Augmented** (scasutt · Apache-2.0 · Speech Recognition, Transformers · 22 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on augmented training data (see the augmentation sketch after this list).
- **Wav2vec2 Base Cv** (jiobiala24 · Apache-2.0 · Speech Recognition, Transformers · 24 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base on the common_voice dataset.
- **Wav2vec2 Base 1** (jiobiala24 · Apache-2.0 · Speech Recognition, Transformers · 20 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the common_voice dataset.
- **Wav2vec2 Xls R Tf Left Right Shuru** (hrdipto · Apache-2.0 · Speech Recognition, Transformers · 29 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, achieving a word error rate (WER) of 1.2628 on the evaluation set.
- **Wav2vec2 Base Checkpoint 6** (jiobiala24 · Apache-2.0 · Speech Recognition, Transformers · 16 downloads · 0 likes): A speech recognition model fine-tuned from wav2vec2-base-checkpoint-5 on the Common Voice dataset.
- **Wav2vec2 Base Demo Colab** (asakawa · Apache-2.0 · Speech Recognition, Transformers · 24 downloads · 0 likes): A speech recognition model fine-tuned from facebook/wav2vec2-base, trained on a specific dataset with a word error rate (WER) of 0.3391.
- **Wav2vec2 Base Demo Colab** (thyagosme · Apache-2.0 · Speech Recognition, Transformers · 20 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, trained in a Colab environment.
- **Wav2vec2 Base Lj Demo Colab** (mohamed-illiyas · Apache-2.0 · Speech Recognition, Transformers · 27 downloads · 0 likes): A fine-tuned speech recognition model based on facebook/wav2vec2-base, suitable for English speech-to-text tasks.
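
Several of the remaining entries (for example the data-augmented toy training run above) differ mainly in how their training audio was prepared. A minimal waveform-augmentation sketch in plain `torch`, applying random gain and additive Gaussian noise before feature extraction; the actual augmentations used by that model are not documented here, so these are illustrative choices only:

```python
import torch

def augment_waveform(waveform: torch.Tensor, noise_std: float = 0.005) -> torch.Tensor:
    """Apply a random gain and additive Gaussian noise to a mono waveform."""
    gain = torch.empty(1).uniform_(0.8, 1.2)          # random volume change
    noise = torch.randn_like(waveform) * noise_std    # low-level background noise
    return torch.clamp(waveform * gain + noise, -1.0, 1.0)

# Example: augment a synthetic 1-second, 16 kHz tone for testing.
t = torch.linspace(0, 1, 16000)
clip = 0.1 * torch.sin(2 * torch.pi * 440 * t)
print(augment_waveform(clip).shape)  # torch.Size([16000])
```
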